Lab 2: Hyperspectral image analysis

ISAT, EPITA

Information on the lab session

Introduction

In this lab you will work directly on this notebook. You will be required to:

You can work in groups of two.

Work during the session

Submission

The submission is done by sending the compressed folder to mauro.dalla-mura@gipsa-lab.grenoble-inp.fr. One submission per group.

The submission can take place in two steps.

  1. [Mandatory] Submit your work at the end of the session.
  2. [Optional] If you want to answer additional questions (e.g., the optional ones), you can submit a second version within one week. This second version can earn up to 5 extra points with respect to the first submission.

1. Introduction

The purpose of this lab session is to get familiar with some common techniques for the analysis of remote sensing hyperspectral images. We will analyze a hyperspectral image acquired on Washington DC Mall, Washington, DC, United States (https://goo.gl/maps/prYNey1N5z81ZtcE7). More information on the data are available in the Appendix of this notebook.

Tip: use the Summary to move into the notebook, to collapse or show cells, etc.

1.1 Outline of the practical section

This notebook is divided into different parts composing a possible experimental analysis:

1.2 Needed packages

This cell contains all the packages and functions needed to execute all the cells of this notebook. Please feel free to add new import statements to customize the code and to run additional experiments.

1.3 Random seed setting

Throughout the notebook, we will use some random generators. You may want to fix the seed given as input to the PRNG* in order to obtain reproducible results across several runs.

* Pseudo-Random Number Generator: a deterministic algorithm that generates a pseudo-random sequence of numbers from a seed. By fixing the seed, you will obtain the same sequence every time the algorithm is restarted.
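For instance, with NumPy's generator API (the seed value below is arbitrary):

```python
import numpy as np

SEED = 42  # any fixed integer works; this value is just an example
rng = np.random.default_rng(SEED)

sample_a = rng.random(3)
# Re-creating a generator with the same seed reproduces the same sequence.
sample_b = np.random.default_rng(SEED).random(3)
```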

1.4 Data loading and preparation

The variable data will contain the hyperspectral image.

Check the size of the image (num rows, num cols, num bands) by typing data.shape. The number of rows, columns and bands are stored in the variables n_rows, n_columns and n_bands, respectively.
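As a sketch (the actual loading cell is provided in the notebook; the placeholder array below only reproduces the dimensions documented in the Appendix):

```python
import numpy as np

# Placeholder with the documented dimensions of the Washington DC image
# (1208 scan lines, 307 pixels per line, 191 bands); in the notebook,
# `data` is filled by the provided loading cell instead.
data = np.zeros((1208, 307, 191), dtype=np.uint8)

n_rows, n_columns, n_bands = data.shape
```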

2. Data visualization

This part of the session is devoted to the visualization and inspection of the hyperspectral image acquired over Washington DC Mall.
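A minimal sketch of how a color composite can be assembled with NumPy alone (the notebook's imshow_stretch helper presumably does something similar; the band indices below are hypothetical, and should be chosen from data/wavelengths.txt):

```python
import numpy as np

def stretch(band, p_low=2, p_high=98):
    """Percentile-stretch a single band into [0, 1] for display."""
    lo, hi = np.percentile(band, [p_low, p_high])
    return np.clip((band - lo) / (hi - lo + 1e-12), 0.0, 1.0)

# Stand-in cube; replace with the real `data` array in the notebook.
data = np.random.rand(60, 40, 191)

# Hypothetical indices for red/green/blue channels.
r_band, g_band, b_band = 60, 27, 17
rgb = np.dstack([stretch(data[:, :, b]) for b in (r_band, g_band, b_band)])
# plt.imshow(rgb); plt.show()  # display with matplotlib
```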

* N.B.: this instruction should be followed by plt.show().

2.1 Questions

From the file data/wavelengths.txt:

The false color composites look overexposed compared to the true color composites.

3. Exploratory analysis

In this part you will perform some simple data inspections in order to gain more insights about the data.

3.1 Band inspection and analysis

  1. Look at the different bands. You can visualize a single band b by typing: imshow_stretch(data[:,:,b]). The central wavelength corresponding to each band is reported in the file data/wavelengths.txt.
    • What can you state? Are the bands significantly different from one another? Are they “equally informative”?

[Figure: atmospheric absorption windows]

  1. Take a look at the figure above.
    • Can you see any correspondence between the spectrum of the atmospheric absorption and what the data look like at the different wavelengths?
      • atmospheric bands in [300nm, 1000nm]
      • Slices [0, 78]

We notice that the atmospheric absorption in the [400-1000] nm range has an impact on the reflectance measured at a pixel.

3.2 Inspection of the spectral signatures of some materials

Show a single band or a RGB composite of the image. You are encouraged to create a new figure by typing plt.figure() and to play with its input parameters figsize and dpi for having a larger (smaller) image and a higher (lower) resolution, respectively.

Move the cursor over the image. The top left corner is (0, 0) and the bottom right corner is (n_rows - 1, n_columns - 1).

plt.plot(data[row_1, col_1, :], 'b', label='pixel 1')
plt.plot(data[row_2, col_2, :], 'g', label='pixel 2')
# ...
plt.legend()
plt.show()

Beyond 2000 nm, the measured energy is roughly the same for every class.

3.3 Analysis of band correlation

In order to check the correlation among bands, we look at the correlation matrix of the data, which shows the cross-correlation between pairs of bands. We have already provided you with the code for its calculation and display. The colorbar on the right side shows the Pearson correlation coefficient $\rho$. Small values denote low correlation, while large values denote high correlation.
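For reference, the correlation matrix can be computed in a few lines with NumPy (a self-contained sketch on a random stand-in cube, not the provided code):

```python
import numpy as np

# Stand-in for the hyperspectral cube (n_rows, n_columns, n_bands).
data = np.random.rand(30, 20, 8)
n_rows, n_columns, n_bands = data.shape

# One row per pixel, one column per band; rowvar=False makes np.corrcoef
# treat the bands (columns) as the variables.
pixels = data.reshape(n_rows * n_columns, n_bands)
corr = np.corrcoef(pixels, rowvar=False)   # (n_bands, n_bands) matrix of rho
```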

The bands [0-50] are heavily correlated with each other, and the bands [50-100] are also correlated with each other. On the other hand, the two groups [0-50] and [50-100] are hardly correlated with one another.

3.4 Data range dynamics

Check the dynamic of the data (range of values) by looking at the standard deviation of each band. We have already provided you with the code for its calculation and display.
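An equivalent one-line computation of the band-wise standard deviation (sketch on a random stand-in cube; the notebook's own cell does the display as well):

```python
import numpy as np

# Stand-in cube; in the notebook `data` is the loaded image.
data = np.random.rand(30, 20, 8)

# One standard deviation per spectral band, computed over all pixels.
band_std = data.std(axis=(0, 1))
# plt.plot(band_std); plt.show()  # visual inspection with matplotlib
```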

The most "informative" bands are the one with the greatest standard deviation.

4. Dimensionality reduction

In this part, you are asked to analyze how a reduction of the data dimensions (initially, there are 191 bands) can still retain most of the information linked to the data variability and under which conditions this may be possible.

You will perform a Principal Component Analysis (PCA). Roughly speaking, PCA is a linear transformation that reprojects the data into another space whose axes are the directions of maximum variability. The intuition is therefore that the transformation will retain the main variability (information) of the data.

In addition, given $N$ the dimensionality of the data, it is possible to select the first $k \ll N$ principal components (which correspond to the eigenvectors associated with the $k$ largest eigenvalues of the covariance matrix of the data) to obtain a representation of the data which retains as much variability as possible.

We have already provided you with the code for its calculation, but please take a little bit of time to analyze it and understand how the PCA works.

N.B.: this version of the PCA code is far less efficient than the ones you can find in standard statistical libraries. In fact, the calculation of the covariance matrix becomes harder and harder as the dimensionality increases. Efficient methods avoid this calculation by exploiting properties that link the data to the eigenvalues and eigenvectors of the covariance matrix.

Inspect the eigenvalues (e.g., plot(S[:k]) for the first k eigenvalues). You will find out that the eigenvalues are sorted by decreasing value (the same sorting goes for the eigenvectors U). You can show that the sum of the eigenvalues is equal to the trace of the covariance matrix, that is the total variance (variance preservation).
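The variance-preservation check suggested above can be carried out directly (a self-contained sketch on toy data, not the notebook's PCA code):

```python
import numpy as np

# Toy data matrix: one sample per row, one band per column.
rng = np.random.default_rng(0)
X = rng.random((500, 6))

C = np.cov(X, rowvar=False)       # covariance matrix of the bands
S, U = np.linalg.eigh(C)          # eigenvalues (ascending) and eigenvectors
S, U = S[::-1], U[:, ::-1]        # reorder by decreasing eigenvalue

# Fraction of total variance explained by the first k components.
k = 3
explained = S[:k].sum() / S.sum()
```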

3 eigenvectors/eigenvalues can explain 99.5% of the data variance.

Show the principal components. In order to show them you need to reorganize the data in matrix format (e.g., PC1 = Dp[:, 0].reshape((n_rows, n_columns)) for the first PC).

The last principal components have a very low range of values (very low standard deviation). Still, the few values they contain make some sense: shapes of buildings are visible. So the smallest eigenvalues do not necessarily correspond to pure noise, but they certainly hold very little information.

5. Hyperspectral classification

This part is devoted to perform a classification of the data. In particular, by relying on the spectral information of the bands, we want to set a "supervised" learning task: we will have a training set consisting of pixels and labels, where each label is going to describe the material of that pixel. The goal will be to train a classifier by using <pixel, label> pairs, such that it learns how to classify the material of a pixel by relying on its spectral features. The classifier will be then tested on a test set for assessing its performance.

5.1 Training and test set creation

We will take 40 randomly chosen samples from the labeled set for each class in order to perform the training. The remaining samples will be used for the test. For creating the two sets run the following cell.

5.2 Support Vector Machine (SVM) classification

You will perform the training of the SVM on the training set and run the inference (classification) on the whole data, while you will evaluate the performance only on the test data.

The intuition behind the Support Vector Machine is the so-called "maximum margin classification". Imagine a problem with two well-separable classes in a 2D feature space. A linear SVM will classify the samples by drawing a straight line for separating the two classes, such that it will maximize the distance from the nearest points of the two classes, as shown in the figure below.

[Figure: maximum margin separation of two classes by a linear SVM]

In more complex settings, SVMs use a "soft margin" when the classes are not perfectly separable, and the kernel trick for drawing nonlinear decision boundaries.

[Figure: soft-margin SVM with non-separable classes]

In this notebook, the SVM you will use will have a Radial Basis Function (RBF) Kernel (or Gaussian kernel), whose mathematical formulation is the following:

$K(\mathbf{x}, \mathbf{y}) = \exp\left( -\frac {\lVert \mathbf{x} - \mathbf{y} \rVert ^ 2} {2\sigma^2} \right)$, often rewritten as $K(\mathbf{x}, \mathbf{y}) = \exp(-\gamma \lVert \mathbf{x} - \mathbf{y} \rVert ^ 2)$ with $\gamma = \frac{1}{2\sigma^2}$.

The $\gamma$ parameter controls the shape of the nonlinear decision boundary. Small values of $\gamma$ correspond to smoother shapes around the support vectors (large standard deviation), while larger values of $\gamma$ let the decision boundary follow the support vectors more and more closely (small standard deviation), with the risk of overfitting the data.

Conversely, the $C$ parameter of the SVM (inversely) controls the amount of regularization of the model. Large values of C make the margin hard, with little or no flexibility for misclassified samples, while smaller values of C make the margin soft and relax the maximal margin condition in the optimization function.

In this case, you will have these two free parameters to adjust for the classification in the second part of the exercise.

In addition, the data are scaled in order to better condition the underlying optimization problem.

N.B.: do not forget to scale (or inverse-scale) the data to avoid inconsistent results!

We can thus instantiate a Support Vector Classifier (SVC) object and train it with the training data (fit).

We choose for you $C=100$ and $\gamma$ inversely proportional to the dimensionality of the data (use the keyword parameter gamma='auto'). There are pretty interesting motivations behind the tuning of the $\gamma$ parameter, but they go beyond the scope of the exercise and therefore will not be addressed.

N.B.: for this section, do not forget to change the value of $\gamma$ according to the dimensionality of the data (e.g., if you are classifying using only 5 bands, the value of $\gamma$ should be 1/5).

Once fitted, the model can be used in inference mode, that is, it can classify new pixels. We use it first to classify the whole image. Then we evaluate it on the test pixels, on which we also compute the Overall Accuracy, i.e., the fraction of correctly classified samples.
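A minimal sketch of this scale/train/predict pipeline with scikit-learn, on synthetic stand-in data (the notebook's own cells use the real training set; here two well-separated blobs stand in for the <pixel, label> pairs):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two well-separated synthetic classes with 5 "bands" each.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 0.3, (20, 5)),
                     rng.normal(3.0, 0.3, (20, 5))])
y_train = np.array([0] * 20 + [1] * 20)

# Scale first (as the notebook recommends), then fit the RBF SVM
# with the parameters chosen in the text: C=100, gamma='auto' (= 1/n_features).
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel='rbf', C=100, gamma='auto')
clf.fit(scaler.transform(X_train), y_train)

# Inference: overall accuracy on the (easy) training data itself.
train_acc = clf.score(scaler.transform(X_train), y_train)
```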

We can display the classification result.

5.3 Performance assessment

If we only analyze the overall accuracy (OA) and only look at the classified image, we may reach misleading conclusions, even when the image looks fine "by eye" and the OA is close to 100%, as in this case.

A useful tool for inspecting the complete classification result is the Confusion Matrix. The generic element $C_{i,j}$ of this matrix represents the number of samples that are known to belong to class $i$ and have been predicted as belonging to class $j$. We can now note that the overall accuracy is, in fact, the sum of the elements on the diagonal of the confusion matrix divided by the total number of samples. The off-diagonal elements are all the misclassified samples, which were assigned to a class $j$ while belonging to a class $i$, or vice-versa (see the examples in the cells below to better understand).

From the confusion matrix we can define further indexes; these are general concepts, but we will particularize them to our case.

  1. Producer's accuracy (recall): defined as the fraction of true positives over the sum of true positives and false negatives: $R = \frac{TP}{TP + FN}$. It describes the "omission errors", that is, all the cases in which the classifier fails to apply the correct label.
  2. User's accuracy (precision): defined as the fraction of true positives over the sum of true positives and false positives: $P = \frac{TP}{TP + FP}$. It describes the "commission errors", that is, all the cases in which the classifier applies a wrong label.

The recall is also called "sensitivity": it measures the fraction of actual positives which are correctly identified. The precision, also known as the positive predictive value, measures the fraction of predicted positives which are actually correct.

An example of interpretation of these two indexes is provided just below.

Ideally we would like both indexes to be high at the same time, but in a real scenario this is hard to achieve.

Both these indicators are complementary to the overall accuracy. Their harmonic mean, called the F1-score (or F-score), $F = \frac{2}{\frac{1}{R} + \frac{1}{P}}$, constitutes a more robust index for the assessment of classification results.

Let's take class 1 (Roof) as an example. The recall is a measure of the model's ability to correctly classify the positive samples. In the confusion matrix, we look at the first row, which corresponds to the samples that we know belong to class 1. In fact, 42 of those samples were classified as Street and 26 as Path. Therefore we have $42 + 26$ false negatives, that is, samples which belong to the Roof class but were erroneously classified. This means that $R_{Roof} = \frac{3726}{3726 + 42 + 26} = 0.98$.

Conversely, if we look at the Path class, we know that 134 samples were correctly classified, but if we look at its column, we see that 26 samples were classified as Path although they belonged to the Roof class. These 26 samples are therefore false positives, that is, samples assigned to the Path class that do not belong to it. This means that $P_{Path} = \frac{134}{134+26}=0.83$.
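Using the numbers quoted in the worked example above, the two indexes can be checked numerically:

```python
# Recall of the Roof class: 3726 true positives, 42 + 26 false negatives.
tp_roof, fn_street, fn_path = 3726, 42, 26
recall_roof = tp_roof / (tp_roof + fn_street + fn_path)      # 0.9821...

# Precision of the Path class: 134 true positives, 26 false positives.
tp_path, fp_from_roof = 134, 26
precision_path = tp_path / (tp_path + fp_from_roof)          # 0.8375
```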

5.4 Further analyses

Perform the same procedures but considering only

  1. a single band (arbitrarily chosen)
  2. the three bands used for the true color composition and
  3. the three bands chosen for the false color composition.

Comment on the results.

Look again at the plot of the standard deviation of the bands. Perform a classification excluding ranges of bands with low (or high) standard deviation.

[Optional] Perform now a classification using the data to which you applied PCA.

[Optional] Perform some of the tests already done considering a different number of training samples (change the second input parameter of gen_train_test).

[Optional] Perform further test by changing:

6. Spectral-spatial classification

Until now, the classification performed on the data relied only on the spectral characteristics of the pixels. No exploitation of the spatial arrangement of the pixels was done. In order to take this complementary source of information into account, we can consider “spatial features”: features that model the spatial information (e.g., spatial correlation of neighboring pixels, geometrical characteristics of the objects, etc.). Once computed, these features can be considered by a classifier. There are many ways to extract spatial features. In this lab session we will consider the features derived from an Extended Morphological Profile (EMP) (see the paper ClassificationEMP.pdf in the doc folder for more information). The EMP is composed of a concatenation of Morphological Profiles (MPs). An MP is a sequence of openings and closings by reconstruction with a structuring element of fixed shape and increasing size, and can be considered a multiscale representation of the image. A representation is given in the image below.

The EMP is obtained by concatenating different MPs each one computed on one of the first Principal Components of the image (see figure below).

The MP is implemented by the function morph_prof in the file utils.py.

The EMP is obtained by considering the first principal components of the image. The example below shows how to build an EMP considering the first 3 PCs, with MPs computed with circular structuring elements with initial radius 1, an increment step size of 3, and 4 levels (thus leading to an MP of 9 images = 4 closings + original image + 4 openings).

The EMP images are smoothed-out versions of the original projections and of the inverted original projections.

Perform a classification considering the EMP. Note that the EMP, after its computation, is a stack of images. Thus, it should be converted into vector format, yielding an array whose shape is (n_rows * n_columns, num levels of EMP).
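The conversion from image stack to vector format is a single reshape (sketch on a random stand-in stack; 27 features = 3 PCs x 9 MP levels, matching the example above):

```python
import numpy as np

# Stand-in EMP stack: 3 PCs x 9 levels = 27 feature images.
n_rows, n_columns, n_features = 50, 40, 27
emp = np.random.rand(n_rows, n_columns, n_features)

# Flatten the spatial dimensions so that each pixel becomes one feature vector.
emp_vectors = emp.reshape(n_rows * n_columns, n_features)
```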

7. Unsupervised analysis

Let us now suppose that no prior information on the land cover classes of interest is available. In this case one could perform an unsupervised analysis to identify whether there are some groupings/clusters in the data. Data belonging to the same cluster share some common properties, e.g., (hopefully) the type of land cover. In contrast to supervised classification, the semantic meaning of these clusters has to be determined a posteriori, once the clusters are found.

Question Run the unsupervised clustering algorithm k-means on the data. Look at the obtained results (e.g., spectrum corresponding to the mean of each cluster and spatial arrangement of the class).
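As a reference for what k-means does internally, here is a minimal NumPy sketch of Lloyd's algorithm on toy data (in the notebook you would rather call sklearn.cluster.KMeans on the pixel spectra; the deterministic initialization below is a simplification for the sketch):

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Minimal Lloyd's algorithm; sklearn.cluster.KMeans is the robust choice."""
    # Deterministic initialization for this sketch: centers spread over the rows.
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[idx].astype(float).copy()
    for _ in range(n_iter):
        # Assignment step: nearest center for every sample.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated blobs standing in for pixel spectra.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (30, 4)), rng.normal(5.0, 0.2, (30, 4))])
labels, centers = kmeans(X, k=2)
```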

Question [optional] Run the same analysis considering both the spectral and spatial features as done for the supervised classification done in the previous part.

8. Spectral unmixing

We now consider the problem of spectral unmixing. The goal is to obtain the spectral signatures, called endmembers, of the materials present in the image, and then, estimate their fractional abundances. We will apply some well known spectral unmixing algorithms to synthetic and real datasets. In particular, we will consider algorithms that assume a linear mixing model (LMM). Some reference papers are in the folder doc.

Linear mixing model

The figure shows a graphical representation of the linear mixing model, widely employed to study the formation of hyperspectral images and their spectral unmixing. The incident light reflected by the surface (displayed in the zoom-in) is collected by the hyperspectral sensor as a signal at each wavelength. The LMM assumes that the surface can be divided into homogeneous regions of different materials. The light beams bounce on these separate patches and reach the sensor without any additional interaction. The value measured by the sensor corresponds to the sum of the light of each beam.

In a formal way, the hyperspectral pixel, $\mathbf{r}\in\mathbb{R}^L$, where $L$ is the number of spectral bands, is formed as a linear combination of the spectral characterizations of the materials weighted by their relative abundances, plus additive noise: \begin{equation} \mathbf{r} = \sum_{p=1}^{P} \mathbf{e}_p a_p + \mathbf{n}, \label{eq:lmm} \end{equation} where $\mathbf{e}_p\in\mathbb{R}^L$ denotes the spectral signature of the $p$-th material, and $a_p\in\mathbb{R}$ denotes the spatial abundance of the respective material. Normally, the spectral signature corresponds to the radiance or the reflectance properties of the materials. The fractional abundances are always non-negative (since it is meaningless to have a negative abundance of any material). Another usual constraint is to assume that the abundances of all the materials present in a hyperspectral pixel sum up to one: \begin{equation} \sum_{p=1}^P a_p = 1. \label{eq:ASC} \end{equation}

The LMM is a simple approximation model. In real life, the materials on the surface are not separated into homogeneous regions, and the signal collected by the sensor is non-linear. The non-linearities are due to the intimate mixture of the materials, which leads to multiple scatterings, as shown in the figure below.

However, the LMM is a good approximation that serves many applications well and has very nice analytical properties.

8.1 Spectral library

We will explore a small library of spectral signatures of materials provided by the U.S. Geological Survey laboratory (http://www.usgs.gov/). Researchers at the U.S. Geological Survey Spectroscopy Lab have measured the spectral reflectance of hundreds of materials in the lab, and have compiled a spectral library known as the USGS Spectral Library (http://speclab.cr.usgs.gov/spectral-lib.html). The libraries are used as references for material identification in remote sensing images.

Question Load and display some spectra from the USGS library, and show the spectrum of a mixture of two materials with different coefficients. Some of the materials present high values of reflectance at most wavelengths (bright reflectance), while others absorb or transmit most of the energy (low reflectance). Plot the spectrum of some materials and look at their reflectance values (you can notice intervals of wavelengths in which the material reflects, absorbs, etc.).

The spectral signatures of the materials could be affected by illumination inhomogeneities, (e.g., shadows), and by the angle of incidence of the light beams (i.e., by the geometry of the surface such as the presence of plains, valleys, mountains, etc). In any case, the shape of the spectral signature of a given material should not be substantially modified. In general, we are more interested in the shape of the spectral signature than in its amplitude.

Question Group (cluster) the materials according to their pair-wise Euclidean and angular distances: do not spend much time on this. For instance, for each endmember, you can divide the pair-wise distances into three groups: the close range for values between 0 and 0.1, the middle range for values between 0.1 and 0.2, and the far range for values greater than 0.2.

Question Simulate a simple model producing a modification of the spectral signature due to a change in illumination as a (random) multiplicative scaling factor applied to each endmember in the library. Compute the pair-wise Euclidean and angular distances of the scaled endmembers and compare them to the distances computed on the original set of endmembers. Make some comments.
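The effect asked about can be checked numerically: a multiplicative scaling changes the Euclidean distance but leaves the spectral angle untouched (self-contained sketch with random stand-in spectra):

```python
import numpy as np

def spectral_angle(x, y):
    """Angle between two spectra (radians); insensitive to amplitude scaling."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0))

rng = np.random.default_rng(0)
e1, e2 = rng.random(100), rng.random(100)   # stand-in endmembers
scale = 0.5                                  # illumination change

d_euc = np.linalg.norm(e1 - e2)
d_euc_scaled = np.linalg.norm(scale * e1 - e2)
d_ang = spectral_angle(e1, e2)
d_ang_scaled = spectral_angle(scale * e1, e2)
```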

Finally, we will simulate data according to the LMM. For a given number of materials, $P$, a hyperspectral pixel can be simulated by linearly combining $P$ random endmembers selected from the USGS spectral library using Eq.~\eqref{eq:lmm}. The goal here is to get familiar with the LMM.
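A minimal LMM simulation sketch with random stand-in endmembers (in the lab you would draw them from the USGS library; the Dirichlet draw is one convenient way to generate abundances that are non-negative and sum to one):

```python
import numpy as np

rng = np.random.default_rng(0)
L, P = 224, 3                    # number of bands / endmembers (toy values)
E = rng.random((L, P))           # stand-in endmember matrix

a = rng.dirichlet(np.ones(P))    # abundances: non-negative, summing to one
noise = rng.normal(0, 0.001, L)  # small additive noise

r = E @ a + noise                # simulated hyperspectral pixel (LMM)
```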

8.3 Endmembers estimation

Consider the estimation of endmembers by a geometric approach such as Vertex Component Analysis [Jose M. P. Nascimento and Jose M. B. Dias, "Vertex Component Analysis: A Fast Algorithm to Unmix Hyperspectral Data", IEEE Trans. Geosci. Remote Sensing, 2005]. A Python implementation of VCA is available at: https://github.com/Laadr/VCA

Question Run VCA and discuss the obtained results. Can you recognize any material among the identified endmembers? How to choose the number of endmembers to extract?

8.4 Abundance estimation

Once we have estimated the endmembers from the data, we proceed by estimating their corresponding fractional abundances. In order to do that, we define an optimization problem: \begin{equation} \mathbf{a}^{*} = \arg\min_{\mathbf{a} \geq 0} \left\| E\mathbf{a} - \mathbf{r} \right\|_{2}^{2}, \label{eq:opt1} \end{equation} where $\mathbf{r}\in\mathbb{R}_{+}^{L}$ denotes a data sample, $E\in\mathbb{R}_{+}^{L \times P}$ denotes the estimated endmembers and $\mathbf{a}\in\mathbb{R}_{+}^{P}$ denotes the abundances. The constrained optimization problem cannot be solved analytically, but it has a unique solution that can be obtained by performing a non-negative least squares.
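A sketch of this non-negative least squares inversion with SciPy's nnls, on noiseless synthetic data where the true abundances are known:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
L, P = 50, 4
E = rng.random((L, P))                   # stand-in endmember matrix
a_true = np.array([0.2, 0.0, 0.5, 0.3])  # ground-truth abundances
r = E @ a_true                           # noiseless pixel

a_hat, residual = nnls(E, r)             # non-negative least squares
```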

The solution to the above optimization problem does not take into account the sum-to-one constraint. In order to do that, we need to solve the following optimization problem: \begin{equation} \mathbf{a}^{*} = \arg\min_{\mathbf{a} \geq 0} \left\| E\mathbf{a} - \mathbf{r} \right\|_{2}^{2},~\textrm{s.t.}~\sum_{p=1}^P a_p = 1. \label{eq:opt2} \end{equation} Solving the above equation can be easily done with the lsqnonneg (non-negative least squares) function, making the following changes: $\tilde{E} = \left[E; \mathbf{1}_P\right]$ and $\tilde{\mathbf{r}} = \left[ \mathbf{r}; 1 \right]$.
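A sketch of this augmentation trick on synthetic data, again with SciPy's nnls. The weight delta on the extra row is an additional assumption of this sketch: a larger weight enforces the sum-to-one constraint more strictly in the least-squares sense.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
L, P = 50, 4
E = rng.random((L, P))
a_true = rng.dirichlet(np.ones(P))       # abundances already sum to one
r = E @ a_true                           # noiseless pixel

delta = 100.0                            # weight of the sum-to-one row (assumption)
E_tilde = np.vstack([E, delta * np.ones((1, P))])
r_tilde = np.concatenate([r, [delta]])

a_hat, _ = nnls(E_tilde, r_tilde)        # non-negative + (weighted) sum-to-one
```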

The procedure presented here is based on the estimation of the endmembers from the data and the retrieval of their abundances. We now consider an alternative approach. Let us assume we have at our disposal a spectral library containing potentially many spectra. The goal is to estimate the abundances from such a large pool of endmembers. The problem we face in this case is that solving the inversion problem with an endmember matrix with many entries usually leads to a solution with a very low reconstruction error, since the (intrinsic) dimensionality of the data is lower than the number of spectra in the library. In order to avoid solutions that are acceptable from a representation point of view (low reconstruction error) but without much real significance, a general rule is to force only a few endmembers to be active for each pixel. This amounts to an estimation of the abundances in which only a few abundances are different from zero for each pixel. This constraint can be imposed in the abundance estimation algorithm by requiring the abundances to be sparse (to contain many zeros).

The estimation of the abundances considering different constraints (e.g., non-negativity, sum to one and sparsity) can be performed by the algorithm: Sparse unmixing via variable splitting and augmented Lagrangian methods (SUNSAL). A python implementation of SUNSAL is available at: https://github.com/Laadr/SUNSAL

Question Estimate the abundances considering a set of known endmembers (e.g., estimated in the previous part or coming from the spectral library)

8.5 Joint estimation of endmembers and abundances

Consider now the joint estimation of endmembers and abundances with non-negativity constraints. This problem can be approached by non-negative matrix factorization (NMF). An implementation of NMF is available in scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html#sklearn.decomposition.NMF).
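A minimal NMF sketch with scikit-learn on synthetic LMM data (parameter choices such as init='nndsvda' and max_iter are illustrative, not prescribed by the lab):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
L, P, N = 60, 3, 200                          # bands, endmembers, pixels
E_true = rng.random((L, P))                   # stand-in endmembers
A_true = rng.dirichlet(np.ones(P), size=N).T  # (P, N) abundances, sum-to-one
R = E_true @ A_true                           # (L, N) noiseless data matrix

# Factor R ~= E_hat @ A_hat with both factors non-negative.
model = NMF(n_components=P, init='nndsvda', max_iter=500)
E_hat = model.fit_transform(R)                # estimated endmembers (L, P)
A_hat = model.components_                     # estimated abundances (P, N)
```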

Question Compute spectral unmixing by considering NMF. Compare the obtained results with the results in the previous parts.

Question [optional] Test the different approaches explored so far on other images (e.g., Cuprite).

Appendix $-$ data

The hyperspectral image of Washington DC Mall will be used in this lab section (Figure 5). The image was acquired by the airborne-mounted Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor over the Mall in Washington, DC. The image comes along with MultiSpec (a freeware multispectral image data analysis system developed at Purdue University) and is available at https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html. It is provided with the permission of the Spectral Information Technology Application Center of Virginia, which was responsible for its collection. The sensor system used in this case measured pixel response in 210 bands in the 0.4 to 2.4 μm region of the visible and infrared spectrum. Bands in the 0.9 and 1.4 μm regions, where the atmosphere is opaque, have been omitted from the data set, leaving 191 bands. The data set contains 1208 scan lines with 307 pixels in each scan line, with a spatial resolution of about 2.8 m. Seven thematic land cover classes are present in the scene: Roofs, Street, Path (graveled paths down the mall center), Grass, Trees, Water, and Shadow.

Below the Washington DC Mall data set is shown (false color composite).

[Figure: false color composite of the Washington DC Mall data set]